{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Retrievers and Node Post-Processors\n",
    "\n",
    "In this notebook, we cover some customization to our existing retrieval process, using the `HierarchicalNodeParser`, `AutoMergingRetriever`, \n",
    "and a custom node-postprocessor that ensures a certain amount of tokens are always sent to the LLM."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Setup"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "import openai\n",
    "import os\n",
    "import sys\n",
    "sys.path.append(os.path.join(os.getcwd(), '..'))\n",
    "\n",
    "os.environ['OPENAI_API_KEY'] = \"YOUR_API_KEY\"\n",
    "openai.api_key = os.environ['OPENAI_API_KEY']"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/Users/loganmarkewich/llama_docs_bot/venv/lib/python3.9/site-packages/tqdm/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html\n",
      "  from .autonotebook import tqdm as notebook_tqdm\n"
     ]
    }
   ],
   "source": [
    "from llama_index import ServiceContext, set_global_service_context\n",
    "from llama_index.llms import OpenAI\n",
    "\n",
    "# Use local embeddings + gpt-3.5-turbo-16k\n",
    "service_context = ServiceContext.from_defaults(\n",
    "    llm=OpenAI(model=\"gpt-3.5-turbo-16k\", max_tokens=512, temperature=0.1),\n",
    "    embed_model=\"local:BAAI/bge-base-en\"\n",
    ")\n",
    "\n",
    "set_global_service_context(service_context)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Node Parsing + Retrieval\n",
    "\n",
    "Previously, we used a custom markdown loader to load chunks from our markdown documentation. However, since then, advancements have been made in llama-index that may provide more relevant retrieval. Specifically, we will use the `HierarchicalNodeParser`, which parses nodes into several chunk sizes.\n",
    "\n",
    "The idea here is that during retrieval, if a majority of chunks are retrieved that have the same parent chunk, we return the larger parent chunk instead.\n",
    "\n",
    "To support this, we can modify our loading code as shown below:"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Loading Helper Function"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [],
   "source": [
    "from llama_index import SimpleDirectoryReader, Document\n",
    "from llama_index.node_parser import HierarchicalNodeParser, SimpleNodeParser, get_leaf_nodes\n",
    "from llama_index.schema import MetadataMode\n",
    "from llama_docs_bot.markdown_docs_reader import MarkdownDocsReader\n",
    "\n",
    "\n",
    "def load_markdown_docs(filepath, hierarchical=True):\n",
    "    \"\"\"Load markdown docs from a directory, excluding all other file types.\"\"\"\n",
    "    loader = SimpleDirectoryReader(\n",
    "        input_dir=filepath, \n",
    "        required_exts=[\".md\"],\n",
    "        file_extractor={\".md\": MarkdownDocsReader()},\n",
    "        recursive=True\n",
    "    )\n",
    "\n",
    "    documents = loader.load_data()\n",
    "\n",
    "    if hierarchical:\n",
    "        # combine all documents into one\n",
    "        documents = [\n",
    "            Document(text=\"\\n\\n\".join(\n",
    "                    document.get_content(metadata_mode=MetadataMode.ALL) \n",
    "                    for document in documents\n",
    "                )\n",
    "            )\n",
    "        ]\n",
    "\n",
    "        # chunk into 3 levels\n",
    "        # majority means 2/3 are retrieved before using the parent\n",
    "        large_chunk_size = 1536\n",
    "        node_parser = HierarchicalNodeParser.from_defaults(\n",
    "            chunk_sizes=[\n",
    "                large_chunk_size, \n",
    "                large_chunk_size // 3,\n",
    "            ]\n",
    "        )\n",
    "\n",
    "        nodes = node_parser.get_nodes_from_documents(documents)\n",
    "        return nodes, get_leaf_nodes(nodes)\n",
    "    else:\n",
    "        node_parser = SimpleNodeParser.from_defaults()\n",
    "        nodes = node_parser.get_nodes_from_documents(documents)\n",
    "        return nodes"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Here, we parse each directory into a single giant document, and then chunk into a heirarchy of 2048, 2048 // 3, and 2048 // 9. \n",
    "\n",
    "This means if 2 of 3 child chunks are retrieved, the `AutoMergingRetriever` will replace the nodes with the larger parent chunk.\n",
    "\n",
    "Now, in order for the auto merging to work properly, we will need to set the top-k higher. However, we still want to avoid sending too much text to the LLM for the sake of latency. So here, we also introduce a local re-ranker to limit the amount of returned nodes after merging.\n",
    "\n",
    "### Load/Create Query Engines\n",
    "\n",
    "Let's write a function to build our query engine tools next.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [],
   "source": [
    "from llama_index import VectorStoreIndex,StorageContext, load_index_from_storage\n",
    "from llama_index.query_engine import RetrieverQueryEngine\n",
    "from llama_index.retrievers import AutoMergingRetriever\n",
    "from llama_index.tools import QueryEngineTool, ToolMetadata\n",
    "from llama_index.storage.docstore import SimpleDocumentStore\n",
    "\n",
    "\n",
    "def get_query_engine_tool(directory, description, hierarchical=True, postprocessors=None):\n",
    "    try:\n",
    "        storage_context = StorageContext.from_defaults(\n",
    "            persist_dir=f\"./data_{os.path.basename(directory)}\"\n",
    "        )\n",
    "        index = load_index_from_storage(storage_context)\n",
    "\n",
    "        if hierarchical:\n",
    "            retriever = AutoMergingRetriever(\n",
    "                index.as_retriever(similarity_top_k=6), \n",
    "                storage_context=storage_context\n",
    "            )\n",
    "        else:\n",
    "            retriever = index.as_retriever(similarity_top_k=12)\n",
    "    except:\n",
    "        if hierarchical:\n",
    "            nodes, leaf_nodes = load_markdown_docs(directory, hierarchical=hierarchical)\n",
    "\n",
    "            docstore = SimpleDocumentStore()\n",
    "            docstore.add_documents(nodes)\n",
    "            storage_context = StorageContext.from_defaults(docstore=docstore)\n",
    "\n",
    "            index = VectorStoreIndex(leaf_nodes, storage_context=storage_context)\n",
    "            index.storage_context.persist(persist_dir=f\"./data_{os.path.basename(directory)}\")\n",
    "\n",
    "            retriever = AutoMergingRetriever(\n",
    "                index.as_retriever(similarity_top_k=12), \n",
    "                storage_context=storage_context\n",
    "            )\n",
    "\n",
    "        else:\n",
    "            nodes = load_markdown_docs(directory, hierarchical=hierarchical)\n",
    "            index = VectorStoreIndex(nodes)\n",
    "            index.storage_context.persist(persist_dir=f\"./data_{os.path.basename(directory)}\")\n",
    "\n",
    "            retriever = index.as_retriever(similarity_top_k=12)\n",
    "\n",
    "    query_engine = RetrieverQueryEngine.from_args(\n",
    "        retriever,\n",
    "        node_postprocessors=postprocessors or [],\n",
    "    )\n",
    "\n",
    "    return QueryEngineTool(query_engine=query_engine, metadata=ToolMetadata(name=directory, description=description))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Compare retrievers\n",
    "\n",
    "You'll notice we included some code to enable/disable the hierarchical node parsing. Let's compare results a bit quickly"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [],
   "source": [
    "hierarchical_engine = get_query_engine_tool(\n",
    "    \"../docs/core_modules/query_modules\",\n",
    "    \"Useful for information on various query engines and retrievers, and anything related to querying data.\",\n",
    "    hierarchical=True, \n",
    ").query_engine"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [],
   "source": [
    "!rm -rf data_*"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [],
   "source": [
    "base_engine = get_query_engine_tool(\n",
    "    \"../docs/core_modules/query_modules\",\n",
    "    \"Useful for information on various query engines and retrievers, and anything related to querying data.\",\n",
    "    hierarchical=False, \n",
    ").query_engine"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [],
   "source": [
    "from llama_index import QueryBundle\n",
    "hierarchical_nodes = hierarchical_engine.retrieve(QueryBundle(\"How do I setup a query engine?\"))\n",
    "base_nodes = base_engine.retrieve(QueryBundle(\"How do I setup a query engine?\"))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "--- Hierarchical ---\n",
      "\n",
      "File Name: ./docs/core_modules/query_modules/query_engine/usage_pattern.md\n",
      "Content Type: text\n",
      "Header Path: Usage Pattern/Configuring a Query Engine/High-Level API\n",
      "Links:\n",
      "\n",
      "You can directly build and configure a query engine from an index in 1 line of code:\n",
      "\n",
      "File Name: ./docs/core_modules/query_modules/query_engine/usage_pattern.md\n",
      "Content Type: text\n",
      "Header Path: Usage Pattern/Configuring a Query Engine/High-Level API\n",
      "Links:\n",
      "\n",
      "query_engine = index.as_query_engine(\n",
      "    response_mode='tree_summarize',\n",
      "    verbose=True,\n",
      ")\n",
      "\n",
      "File Name: ./docs/core_modules/query_modules/query_engine/usage_pattern.md\n",
      "Content Type: text\n",
      "Header Path: Usage Pattern/Configuring a Query Engine/High-Level API\n",
      "Links:\n",
      "\n",
      "> Note: While the high-level API optimizes for ease-of-use, it does *NOT* expose full range of configurability.See **Response Modes** for a full list of response modes and what they do.File Name: ./docs/core_modules/query_modules/query_engine/usage_pattern.md\n",
      "Content Type: text\n",
      "Header Path: Usage Pattern/Configuring a Query Engine/High-Level API\n",
      "Links:\n",
      "\n",
      "---\n",
      "maxdepth: 1\n",
      "hidden:\n",
      "---\n",
      "response_modes.md\n",
      "streaming.md\n",
      "\n",
      "File Name: ./docs/core_modules/query_modules/query_engine/usage_pattern.md\n",
      "Content Type: text\n",
      "Header Path: Usage Pattern/Configuring a Query Engine/Low-Level Composition API\n",
      "Links:\n",
      "\n",
      "You can use the low-level composition API if you need more granular control.Concretely speaking, you would explicitly construct a `QueryEngine` object instead of calling `index.as_query_engine(.)`.> Note: You may need to look at API references or example notebooks.File Name: ./docs/core_modules/query_modules/query_engine/usage_pattern.\n",
      "---\n",
      "ipynb\n",
      "/examples/query_engine/SQLAutoVectorQueryEngine.ipynb\n",
      "/examples/query_engine/SQLJoinQueryEngine.ipynb\n",
      "/examples/index_structs/struct_indices/duckdb_sql_query.ipynb\n",
      "Retry Query Engine </examples/evaluation/RetryQuery.ipynb>\n",
      "Retry Source Query Engine </examples/evaluation/RetryQuery.ipynb>\n",
      "Retry Guideline Query Engine </examples/evaluation/RetryQuery.ipynb>\n",
      "/examples/query_engine/citation_query_engine.ipynb\n",
      "/examples/query_engine/pdf_tables/recursive_retriever.ipynb\n",
      "```\n",
      "\n",
      "File Name: ./docs/core_modules/query_modules/query_engine/modules.md\n",
      "Content Type: code\n",
      "Header Path: Module Guides/Experimental\n",
      "\n",
      "```{toctree}\n",
      "---\n",
      "maxdepth: 1\n",
      "---\n",
      "/examples/query_engine/flare_query_engine.ipynb\n",
      "```\n",
      "\n",
      "File Name: ./docs/core_modules/query_modules/query_engine/response_modes.md\n",
      "Content Type: text\n",
      "Header Path: Response Modes\n",
      "\n",
      "Right now, we support the following options:\n",
      "- `default`: \"create and refine\" an answer by sequentially going through each retrieved `Node`; \n",
      "    This makes a separate LLM call per Node.Good for more detailed answers.- `compact`: \"compact\" the prompt during each LLM call by stuffing as \n",
      "    many `Node` text chunks that can fit within the maximum prompt size.If there are \n",
      "    too many chunks to stuff in one prompt, \"create and refine\" an answer by going through\n",
      "    multiple prompts.- `tree_summarize`: Given a set of `Node` objects and the query, recursively construct a tree \n",
      "    and return the root node as the response.Good for summarization purposes.\n",
      "---\n",
      ")\n",
      "query_engine = graph.as_query_engine(custom_query_engines=custom_query_engines)\n",
      "response = query_engine.query(query_str)\n",
      "\n",
      "File Name: ../docs/core_modules/query_modules/query_engine/advanced/query_transformations.md\n",
      "Content Type: text\n",
      "Header Path: Query Transformations/query\n",
      "Links:\n",
      "\n",
      "Check out our example notebook for a full walkthrough.File Name: ../docs/core_modules/query_modules/query_engine/advanced/query_transformations.md\n",
      "Content Type: text\n",
      "Header Path: Query Transformations/Multi-Step Query Transformations\n",
      "Links:\n",
      "\n",
      "Multi-step query transformations are a generalization on top of existing single-step query transformation approaches.Given an initial, complex query, the query is transformed and executed against an index.The response is retrieved from the query.Given the response (along with prior responses) and the query, followup questions may be asked against the index as well.This technique allows a query to be run against a single knowledge source until that query has satisfied all questions.An example image is shown below.!\n",
      "---\n",
      "/docs/core_modules/query_modules/query_engine/advanced/query_transformations.md\n",
      "Content Type: text\n",
      "Header Path: Query Transformations/run query with HyDE query transform\n",
      "Links:\n",
      "\n",
      "query_str = \"what did paul graham do after going to RISD\"\n",
      "hyde = HyDEQueryTransform(include_original=True)\n",
      "query_engine = index.as_query_engine()\n",
      "query_engine = TransformQueryEngine(query_engine, query_transform=hyde)\n",
      "response = query_engine.query(query_str)\n",
      "print(response)\n",
      "\n",
      "File Name: ./docs/core_modules/query_modules/query_engine/advanced/query_transformations.md\n",
      "Content Type: text\n",
      "Header Path: Query Transformations/run query with HyDE query transform\n",
      "Links:\n",
      "\n",
      "Check out our example notebook for a full walkthrough.File Name: ./docs/core_modules/query_modules/query_engine/advanced/query_transformations.md\n",
      "Content Type: text\n",
      "Header Path: Query Transformations/Single-Step Query Decomposition\n",
      "Links:\n",
      "\n",
      "Some recent approaches (e.g.self-ask, ReAct) have suggested that LLM's \n",
      "perform better at answering complex questions when they break the question into smaller steps.We have found that this is true for queries that require knowledge augmentation as well.If your query is complex, different parts of your knowledge base may answer different \"subqueries\" around the overall query.Our single-step query decomposition feature transforms a **complicated** question into a simpler one over the data collection to help provide a sub-answer to the original question.This is especially helpful over a composed graph.Within a composed graph, a query can be routed to multiple subindexes, each representing a subset of the overall knowledge corpus.Query decomposition allows us to transform the query into a more suitable question over any given index.An example image is shown below.!\n",
      "---\n",
      "- `no_text`: Only runs the retriever to fetch the nodes that would have been sent to the LLM, \n",
      "    without actually sending them.Then can be inspected by checking `response.source_nodes`.The response object is covered in more detail in Section 5.- `accumulate`: Given a set of `Node` objects and the query, apply the query to each `Node` text\n",
      "    chunk while accumulating the responses into an array.Returns a concatenated string of all\n",
      "    responses.Good for when you need to run the same query separately against each text\n",
      "    chunk.See Response Synthesizer to learn more.File Name: ./docs/core_modules/query_modules/query_engine/root.md\n",
      "Content Type: text\n",
      "Header Path: Query Engine/Concept\n",
      "Links:\n",
      "\n",
      "Query engine is a generic interface that allows you to ask question over your data.A query engine takes in a natural language query, and returns a rich response.It is most often (but not always) built on one or many Indices via Retrievers.You can compose multiple query engines to achieve more advanced capability.File Name: ./docs/core_modules/query_modules/query_engine/root.md\n",
      "Content Type: text\n",
      "Header Path: Query Engine/Concept\n",
      "Links:\n",
      "\n",
      "If you want to have a conversation with your data (multiple back-and-forth instead of a single question & answer), take a look at Chat Engine\n",
      "\n",
      "File Name: ./docs/core_modules/query_modules/query_engine/root.md\n",
      "Content Type: text\n",
      "Header Path: Query Engine/Usage Pattern\n",
      "Links:\n",
      "\n",
      "Get started with:\n",
      "\n",
      "File Name: ./docs/core_modules/query_modules/query_engine/root.md\n",
      "Content Type: text\n",
      "Header Path: Query Engine/Usage Pattern\n",
      "Links:\n",
      "\n",
      "query_engine = index.as_query_engine()\n",
      "response = query_engine.query(\"Who is Paul Graham.\n",
      "---\n",
      "]\n",
      "\n",
      "query_engine = index.as_query_engine()\n",
      "chat_engine = CondenseQuestionChatEngine.from_defaults(\n",
      "    query_engine=query_engine, \n",
      "    condense_question_prompt=custom_prompt,\n",
      "    chat_history=custom_chat_history,\n",
      "    verbose=True\n",
      ")\n",
      "\n",
      "File Name: ../docs/core_modules/query_modules/chat_engines/usage_pattern.md\n",
      "Content Type: text\n",
      "Header Path: Usage Pattern/Configuring a Chat Engine/Streaming\n",
      "Links:\n",
      "\n",
      "To enable streaming, you simply need to call the `stream_chat` endpoint instead of the `chat` endpoint.File Name: ../docs/core_modules/query_modules/chat_engines/usage_pattern.md\n",
      "Content Type: text\n",
      "Header Path: Usage Pattern/Configuring a Chat Engine/Streaming\n",
      "Links:\n",
      "\n",
      "This somewhat inconsistent with query engine (where you pass in a `streaming=True` flag).We are working on making the behavior more consistent!File Name: ../docs/core_modules/query_modules/chat_engines/usage_pattern.md\n",
      "Content Type: code\n",
      "Header Path: Usage Pattern/Configuring a Chat Engine/Streaming\n",
      "\n",
      "```python\n",
      "chat_engine = index.as_chat_engine()\n",
      "streaming_response = chat_engine.stream_chat(\"Tell me a joke.\n",
      "Total length: 2365\n"
     ]
    }
   ],
   "source": [
    "from llama_index.utils import globals_helper\n",
    "from llama_index.schema import MetadataMode\n",
    "\n",
    "print(\"\\n--- Hierarchical ---\\n\")\n",
    "print('\\n---\\n'.join([node.node.text for node in hierarchical_nodes]))\n",
    "\n",
    "total_length = 0\n",
    "for node in hierarchical_nodes:\n",
    "    total_length += len(globals_helper.tokenizer(node.node.get_content(metadata_mode=MetadataMode.LLM)))\n",
    "print(f\"Total length: {total_length}\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "--- Base ---\n",
      "\n",
      "Query engine is a generic interface that allows you to ask question over your data.\n",
      "\n",
      "A query engine takes in a natural language query, and returns a rich response.\n",
      "It is most often (but not always) built on one or many Indices via Retrievers.\n",
      "You can compose multiple query engines to achieve more advanced capability.\n",
      "---\n",
      "You can use the low-level composition API if you need more granular control.\n",
      "Concretely speaking, you would explicitly construct a `QueryEngine` object instead of calling `index.as_query_engine(...)`.\n",
      "> Note: You may need to look at API references or example notebooks.\n",
      "---\n",
      "To enable streaming, you need to use an LLM that supports streaming.\n",
      "Right now, streaming is supported by `OpenAI`, `HuggingFaceLLM`, and most LangChain LLMs (via `LangChainLLM`).\n",
      "\n",
      "Configure query engine to use streaming:\n",
      "\n",
      "If you are using the high-level API, set `streaming=True` when building a query engine.\n",
      "---\n",
      "vector_query_engine = vector_index.as_query_engine()\n",
      "vector_query_engine = TransformQueryEngine(\n",
      "    vector_query_engine, \n",
      "    query_transform=decompose_transform\n",
      "    transform_extra_info={'index_summary': vector_index.index_struct.summary}\n",
      ")\n",
      "custom_query_engines = {\n",
      "    vector_index.index_id: vector_query_engine\n",
      "}\n",
      "---\n",
      "step_decompose_transform = StepDecomposeQueryTransform(\n",
      "    llm_predictor, verbose=True\n",
      ")\n",
      "\n",
      "query_engine = index.as_query_engine()\n",
      "query_engine = MultiStepQueryEngine(query_engine, query_transform=step_decompose_transform)\n",
      "\n",
      "response = query_engine.query(\n",
      "    \"Who was in the first batch of the accelerator program the author started?\",\n",
      ")\n",
      "print(str(response))\n",
      "---\n",
      "You can directly build and configure a query engine from an index in 1 line of code:\n",
      "---\n",
      "```{toctree}\n",
      "---\n",
      "maxdepth: 1\n",
      "---\n",
      "/examples/query_engine/RouterQueryEngine.ipynb\n",
      "/examples/query_engine/RetrieverRouterQueryEngine.ipynb\n",
      "/examples/query_engine/JointQASummary.ipynb\n",
      "/examples/query_engine/sub_question_query_engine.ipynb\n",
      "/examples/query_transformations/SimpleIndexDemo-multistep.ipynb\n",
      "/examples/query_engine/SQLRouterQueryEngine.ipynb\n",
      "/examples/query_engine/SQLAutoVectorQueryEngine.ipynb\n",
      "/examples/query_engine/SQLJoinQueryEngine.ipynb\n",
      "/examples/index_structs/struct_indices/duckdb_sql_query.ipynb\n",
      "Retry Query Engine </examples/evaluation/RetryQuery.ipynb>\n",
      "Retry Source Query Engine </examples/evaluation/RetryQuery.ipynb>\n",
      "Retry Guideline Query Engine </examples/evaluation/RetryQuery.ipynb>\n",
      "/examples/query_engine/citation_query_engine.ipynb\n",
      "/examples/query_engine/pdf_tables/recursive_retriever.ipynb\n",
      "```\n",
      "---\n",
      "query_engine = RetrieverQueryEngine(\n",
      "    retriever=retriever,\n",
      "    response_synthesizer=response_synthesizer,\n",
      ")\n",
      "---\n",
      "If you want to have a conversation with your data (multiple back-and-forth instead of a single question & answer), take a look at Chat Engine\n",
      "---\n",
      "Build a query engine from index:\n",
      "---\n",
      "```{toctree}\n",
      "---\n",
      "maxdepth: 1\n",
      "---\n",
      "Retriever Query Engine </examples/query_engine/CustomRetrievers.ipynb>\n",
      "```\n",
      "---\n",
      "from llama_index import get_response_synthesizer\n",
      "synth = get_response_synthesizer(streaming=True, ...)\n",
      "query_engine = RetrieverQueryEngine(response_synthesizer=synth, ...)\n",
      "Total length: 1465\n"
     ]
    }
   ],
   "source": [
    "print(\"\\n--- Base ---\\n\")\n",
    "print('\\n---\\n'.join([node.node.text for node in base_nodes]))\n",
    "\n",
    "total_length = 0\n",
    "for node in base_nodes:\n",
    "    total_length += len(globals_helper.tokenizer(node.node.get_content(metadata_mode=MetadataMode.LLM)))\n",
    "print(f\"Total length: {total_length}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As you can see, the hierarchical query engine seems to return better text, but there is also a LOT of text.\n",
    "\n",
    "If not enough nodes are merged in the retriever, we can end up with a lot of text, due to setting the top-k so high.\n",
    "\n",
    "So, let's write a custom node-postprocessor to make sure this doesn't happen!"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Custom Node Post-Processor\n",
    "\n",
    "Here, we use a very basic approach to approximate token counts. We return the most nodes that fit within our token count.\n",
    "The nodes are already pre-sorted, so we don't have to worry about similarity scores here."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [],
   "source": [
    "from typing import Callable, Optional\n",
    "\n",
    "from llama_index.utils import globals_helper\n",
    "from llama_index.schema import MetadataMode\n",
    "\n",
    "class LimitRetrievedNodesLength:\n",
    "\n",
    "    def __init__(self, limit: int = 3000, tokenizer: Optional[Callable] = None):\n",
    "        self._tokenizer = tokenizer or globals_helper.tokenizer\n",
    "        self.limit = limit\n",
    "\n",
    "    def postprocess_nodes(self, nodes, query_bundle):\n",
    "        included_nodes = []\n",
    "        current_length = 0\n",
    "\n",
    "        for node in nodes:\n",
    "            current_length += len(self._tokenizer(node.node.get_content(metadata_mode=MetadataMode.LLM)))\n",
    "            if current_length > self.limit:\n",
    "                break\n",
    "            included_nodes.append(node)\n",
    "\n",
    "        return included_nodes"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...\n",
      "To disable this warning, you can either:\n",
      "\t- Avoid using `tokenizers` before the fork if possible\n",
      "\t- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)\n"
     ]
    }
   ],
   "source": [
    "!rm -rf data_*"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [],
   "source": [
    "query_engine = get_query_engine_tool(\n",
    "    \"../docs/core_modules/query_modules\",\n",
    "    \"Useful for information on various query engines and retrievers, and anything related to querying data.\",\n",
    "    hierarchical=True,\n",
    "    postprocessors=[LimitRetrievedNodesLength(limit=3000)]\n",
    ").query_engine"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Total length: 2971\n"
     ]
    }
   ],
   "source": [
    "hierarchical_nodes = query_engine.retrieve(QueryBundle(\"How do I setup a query engine?\"))\n",
    "total_length = 0\n",
    "for node in hierarchical_nodes:\n",
    "    total_length += len(globals_helper.tokenizer(node.node.get_content(metadata_mode=MetadataMode.LLM)))\n",
    "print(f\"Total length: {total_length}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Final Query Engine"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "With our functions setup, we can load/create our indexes and create our final query engine across our documentation."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 60,
   "metadata": {},
   "outputs": [],
   "source": [
    "import nest_asyncio\n",
    "nest_asyncio.apply()\n",
    "\n",
    "from llama_index.query_engine import SubQuestionQueryEngine, RouterQueryEngine\n",
    "\n",
    "# Here we define the directories we want to index, as well as a description for each\n",
    "# NOTE: these descriptions are hand-written based on my understanding. We could have also\n",
    "# used an LLM to write these, maybe a future experiment.\n",
    "docs_directories = {\n",
    "    \"../docs/community\": \"Useful for information on community integrations with other libraries, vector dbs, and frameworks.\", \n",
    "    \"../docs/core_modules/agent_modules\": \"Useful for information on data agents and tools for data agents.\", \n",
    "    \"../docs/core_modules/data_modules\": \"Useful for information on data, storage, indexing, and data processing modules.\",\n",
    "    \"../docs/core_modules/model_modules\": \"Useful for information on LLMs, embedding models, and prompts.\",\n",
    "    \"../docs/core_modules/query_modules\": \"Useful for information on various query engines and retrievers, and anything related to querying data.\",\n",
    "    \"../docs/core_modules/supporting_modules\": \"Useful for information on supporting modules, like callbacks, evaluators, and other supporting modules.\",\n",
    "    \"../docs/getting_started\": \"Useful for information on getting started with LlamaIndex.\", \n",
    "    \"../docs/development\": \"Useful for information on contributing to LlamaIndex development.\",\n",
    "}\n",
    "\n",
    "# Build query engine tools\n",
    "query_engine_tools = [\n",
    "    get_query_engine_tool(\n",
    "        directory, \n",
    "        description, \n",
    "        hierarchical=True, \n",
    "        postprocessors=[LimitRetrievedNodesLength(limit=3000)]\n",
    "    ) for directory, description in docs_directories.items()\n",
    "]\n",
    "\n",
    "# build top-level router -- this will route to multiple sub-indexes and aggregate results\n",
    "# query_engine = SubQuestionQueryEngine.from_defaults(\n",
    "#     query_engine_tools=query_engine_tools,\n",
    "#     service_context=service_context,\n",
    "#     verbose=False\n",
    "# )\n",
    "\n",
    "query_engine = RouterQueryEngine.from_defaults(\n",
    "    query_engine_tools=query_engine_tools,\n",
    "    service_context=service_context,\n",
    "    select_multi=True,\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 61,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/markdown": [
       "**`Final Response:`** To setup a ChromaDB Vector Store, you can use the following code sample:\n",
       "\n",
       "```python\n",
       "import chromadb\n",
       "from llama_index.vector_stores import ChromaVectorStore\n",
       "\n",
       "chroma_client = chromadb.Client()\n",
       "chroma_collection = chroma_client.create_collection(\"quickstart\")\n",
       "\n",
       "vector_store = ChromaVectorStore(\n",
       "    chroma_collection=chroma_collection,\n",
       ")\n",
       "```\n",
       "\n",
       "This code imports the necessary libraries, creates a ChromaDB client, and then creates a collection called \"quickstart\". Finally, it initializes a ChromaVectorStore using the created collection."
      ],
      "text/plain": [
       "<IPython.core.display.Markdown object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "from llama_index.response.notebook_utils import display_response\n",
    "response = query_engine.query(\"How do I setup a ChromaDB Vector Store? Give me a code sample please.\")\n",
    "display_response(response)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 62,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/markdown": [
       "**`Final Response:`** To customize Document objects, you can include useful metadata using the `metadata` dictionary on each document. Additionally, you can customize the embedding metadata text by setting the `excluded_embed_metadata_keys` attribute to exclude specific metadata keys from being included in the embedding model. You can also customize the format of the metadata using attributes such as `metadata_separator` and `metadata_template`. Furthermore, you can pass in a service context to specific parts of the pipeline to override the default configuration. This allows you to set different components such as the LLM, embedding model, node parser, and prompt helper according to your requirements, thereby tailoring the behavior of the Document objects to suit your needs."
      ],
      "text/plain": [
       "<IPython.core.display.Markdown object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "response = query_engine.query(\"How can I customize Document objects?\")\n",
    "display_response(response)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 63,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/markdown": [
       "**`Final Response:`** You can customize Document metadata in a few ways. \n",
       "\n",
       "First, you can exclude specific metadata keys from being visible to the LLM (Language Model) by using the `excluded_llm_metadata_keys` attribute. This allows you to exclude certain metadata from being read by the LLM during response synthesis.\n",
       "\n",
       "Second, you can exclude metadata keys from being visible to the embedding model by using the `excluded_embed_metadata_keys` attribute. This is useful if you don't want certain text to bias the embeddings.\n",
       "\n",
       "Additionally, you can customize the format of the metadata using the following attributes:\n",
       "- `metadata_seperator`: controls the separator between each key/value pair of the metadata.\n",
       "- `metadata_template`: controls how each key/value pair is formatted.\n",
       "- `text_template`: controls how the metadata is joined with the text content of the document.\n",
       "\n",
       "You can set the metadata dictionary in the document constructor or after the document is created. You can also set the filename automatically using the `SimpleDirectoryReader` and `file_metadata` hook.\n",
       "\n",
       "Overall, customizing Document metadata allows you to control what metadata is visible to the LLM and embedding model, as well as the format of the metadata."
      ],
      "text/plain": [
       "<IPython.core.display.Markdown object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "response = query_engine.query(\"How can I customize Document metadata?\")\n",
    "display_response(response)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Conclusion\n",
    "\n",
    "Here' we covered a ton of concepts\n",
    "- Node Parsing and Retrievers, specifically the `AutoMergingRetriever` and `HierarchicalNodeParser`\n",
    "- Node post-processors and custom node-postprocessing\n",
    "- Reviewing setting up a `RouterQueryEngine`\n",
    "\n",
    "The full code is available in the `llama_docs_bot` folder in the repo!"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "venv",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.6"
  },
  "orig_nbformat": 4
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
